Isilon OneFS: Event notification: Node Offline - Event ID: 200010001, 300010003, 399990001, 900160001, 910100006, 400150007



Symptoms

Event

You receive a "Node Offline" event notification. Event ID: 200010001.

"Node Offline" events are generated when a node is reported offline by the other nodes in the cluster. This event can also be generated when the internal link is lost on any node.

NOTE: If the node is not turned on, follow the procedure in "How to power cycle and drain an Isilon node".

Cause

Details

One of the following conditions is true:

  • One or more nodes rebooted.
  • One or more nodes are powered off.
  • A node lacks back-end network connectivity over InfiniBand (IB). Back-end connectivity is a node's ability to communicate with the other nodes in the cluster.
  • A node cannot join the group.

Resolution

Response

Before you begin troubleshooting the issue, confirm that the event is not related to maintenance on the cluster. After confirming that no maintenance is in progress, proceed with the following troubleshooting.

If the node rebooted

  1. Open an SSH connection to the node and log on using the "root" account.
  2. Run the following command to confirm the node rejoined the cluster:

    isi status

    The isi status command returns output similar to the following. If the node successfully rejoined the cluster, the Health column will not display D (down):
     
                       Health  Throughput (bps) HDD Storage   SSD Storage
    ID |IP Address     |DASR |  In   Out  Total| Used / Size|Used / Size
    -------------------+-----+-----+-----+-----+-----------------+-----------------
      1|10.111.183.10  | OK  | 115K| 220K| 335K| 531M/  10T(< 1%)|    (No SSDs)
      2|10.111.183.11  | OK  |    0|    0|    0| 519M/  10T(< 1%)|    (No SSDs)
      3|10.111.183.12  | OK  |    0|  26K|  26K| 521M/  10T(< 1%)|    (No SSDs)
    -------------------+-----+-----+-----+-----+-----------------+-----------------
    Cluster Totals:          | 115K| 246K| 361K| 1.5G/  31T(< 1%)|    (No SSDs)

         Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

     
  3. Run the following command to confirm the uptime duration:

    uptime

    Output similar to the following appears:

    8:41PM up 10 mins, 1 user, load averages: 0.08, 0.18, 0.14

    If the node rebooted recently, the reported uptime is short, typically a matter of minutes.
     
  4. Gather logs by running the following command and send them to Isilon Technical Support for analysis:

    isi_gather_info
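
The uptime check in step 3 can be scripted. The following is a minimal sketch, not an Isilon-provided tool. The function name uptime_is_short is hypothetical, and the sketch assumes that the uptime line follows the format shown above, where a short uptime is printed as a count of seconds or minutes immediately after the word "up".

```shell
# Hypothetical helper: succeeds (exit status 0) when the uptime line read
# from stdin reports an uptime given only in seconds or minutes,
# for example "up 10 mins,". A line such as "up 3 days, 10 mins," does
# not match, because the first unit after "up" is not seconds or minutes.
uptime_is_short() {
    grep -Eq 'up +[0-9]+ +(sec|secs|min|mins),'
}

# Example use on a node:
#   uptime | uptime_is_short && echo "Uptime is short; the node likely rebooted recently."
```

The function reads from standard input rather than calling uptime itself, so it can also be run against a saved copy of the output.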

If you can ping the external IP address of the down node

  1. Confirm the status of the node:
    1. Open an SSH connection to the node and log on using the "root" account.
    2. Run the following command:

      ifconfig |grep -A4 ib1

      The ifconfig command should return output similar to the following. The line "status: active" indicates that the internal interface is active:
       
      ib1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 2004
      lladdr 0.15.1b.0.10.bd.4c.77
      inet 172.10.111.200 netmask 0xffffff00 broadcast 1.10.111.255 zone 1
      media: Infiniband autoselect
      status: active

       
  2. If the status is inactive, check the following:
    1. Are the activity lights for the ports on the IB card on or off?
      • If the lights are off, continue with the next check.
    2. Are the IB cables firmly attached to the node and the IB switch?
      • If not, reseat the cables on the node and the switch.
    3. Is the IB switch powered on?
      • If not, power it on.
    4. Visually inspect the node to verify that the power light is on.

If the node is turned off

  1. Attempt to turn on the node.
     

    NOTE
    If possible, establish serial access to the node before you turn it on so that you can monitor the boot process and capture any output that might assist in troubleshooting. For more information, see Isilon: How to connect to the management port of a node.

  2. If the node turns on, confirm whether it rejoined the cluster:
    1. Open a secure shell (SSH) connection to a different node in the cluster and log on using the root account.
    2. Run the following command to determine whether the node has rejoined the cluster:

      isi status

      The isi status command returns output similar to the following. If the node successfully rejoined the cluster, the Health column will not display D (down):
       
                         Health  Throughput (bps) HDD Storage   SSD Storage
      ID |IP Address     |DASR |  In   Out  Total| Used / Size|Used / Size
      -------------------+-----+-----+-----+-----+-----------------+-----------------
        1|10.111.183.10  | OK  | 115K| 220K| 335K| 531M/  10T(< 1%)|    (No SSDs)
        2|10.111.183.11  | OK  |    0|    0|    0| 519M/  10T(< 1%)|    (No SSDs)
        3|10.111.183.12  | OK  |    0|  26K|  26K| 521M/  10T(< 1%)|    (No SSDs)
      -------------------+-----+-----+-----+-----+-----------------+-----------------
      Cluster Totals:          | 115K| 246K| 361K| 1.5G/  31T(< 1%)|    (No SSDs)

           Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

       
    3. If the node rejoins the cluster, gather logs by running the following command and send them to Isilon Technical Support for analysis:

      isi_gather_info
       
    4. If the node does not rejoin the cluster, proceed to the next section.
  3. If the node does not turn on, ensure that the circuit breakers are operational and that the power outlets are active.
  4. If the node is not receiving power, resolve the power supply issue.
  5. If the node is off and it is receiving power, contact Isilon Technical Support for help troubleshooting the issue.
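
The isi status check used in this procedure can be scripted to list the nodes that are still down. The following is a rough sketch under stated assumptions, not an Isilon-provided tool. The function name down_nodes is hypothetical. It assumes the table layout shown above, with "|" separating the ID, IP Address, and Health columns, and it assumes that a down node has the letter D somewhere in its Health field; the exact rendering of that field for a down node is not shown in this article.

```shell
# Hypothetical helper: reads "isi status" output from stdin and prints
# the ID of every node whose Health field contains the letter D.
down_nodes() {
    awk -F'|' '
        # Node rows have a purely numeric first field; the header,
        # separator, and totals lines do not, so they are skipped.
        $1 ~ /^ *[0-9]+ *$/ && $3 ~ /D/ {
            gsub(/ /, "", $1)   # strip the padding around the node ID
            print $1
        }
    '
}

# Example use from a different node in the cluster:
#   isi status | down_nodes
```

No output means that no node row matched, which corresponds to the Health column not displaying D for any node.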

If the node is powered on but did not rejoin the cluster

  1. Attempt to establish remote access via a secure shell (SSH) session. If the SSH session fails, attempt to establish remote access via the serial console.
  2. If neither the SSH session nor the serial console is responsive, press CTRL+T either within the SSH session or on the serial console.
  3. If pressing CTRL+T produces output, record the output, and then contact Isilon Technical Support for failure analysis.
  4. If the node is unresponsive, press the power button three times and then wait five minutes for the node to power off.
  5. If the node does not power down, press and hold down the power button until the node powers off.
  6. Press the power button again to power on the node.
  7. If the node powers up and returns a login prompt, log on using the "root" account.
  8. Gather logs by running the following command and send them to Isilon Technical Support for analysis:

    isi_gather_info
     
  9. If the node does not rejoin the cluster, contact Isilon Technical Support for help troubleshooting the issue.

Additional Information



 Event Id: 200010002 - NODE_STATUS_ONLINE

 Event Id: 200010003 - XTND_OFFLINE

 Event Id: 200010005 - DISKNODE_OFFLINE

 Event Id: 299990001 - NODE_COALESCE

 Event Id: 300020001 - RO_TRANS_FAILED

 Event Id: 300010002 - NODE_SHUTDOWN

 Event Id: 300020002 - NODE_REBOOT_JRNL_BKUP_FAIL

OneFS error: Could not recover journal
https://www.dell.com/support/kbdoc/32508

How to safely shut down an Isilon cluster prior to a scheduled power outage
https://www.dell.com/support/kbdoc/18989

 Event Id: 300010003 - BOOT_TIMEOUT

 Event Id: 399990001 - MAINT_REBOOT_COALESCE

 Event Id: 300020003 - MAINT_REBOOT_SHUTDOWN_FAILED

 Event Id: 300010001 - NODE_REBOOT
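
When triaging a notification, it can help to translate an event ID into its name. The lookup below is a small sketch built only from the IDs listed in this article. The function name event_name is hypothetical, and the table is not a complete list of OneFS events.

```shell
# Hypothetical helper: prints the event name for an event ID listed in
# this article. Returns a non-zero status for any other ID.
event_name() {
    case "$1" in
        200010001) echo "Node Offline" ;;
        200010002) echo "NODE_STATUS_ONLINE" ;;
        200010003) echo "XTND_OFFLINE" ;;
        200010005) echo "DISKNODE_OFFLINE" ;;
        299990001) echo "NODE_COALESCE" ;;
        300010001) echo "NODE_REBOOT" ;;
        300010002) echo "NODE_SHUTDOWN" ;;
        300010003) echo "BOOT_TIMEOUT" ;;
        300020001) echo "RO_TRANS_FAILED" ;;
        300020002) echo "NODE_REBOOT_JRNL_BKUP_FAIL" ;;
        300020003) echo "MAINT_REBOOT_SHUTDOWN_FAILED" ;;
        399990001) echo "MAINT_REBOOT_COALESCE" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Example use:
#   event_name 300010003
```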

Affected Products

Isilon

Products

Isilon, PowerScale OneFS
Article Properties
Article Number: 000055936
Article Type: Solution
Last Modified: 29 Mar 2025
Version:  5